Abstract

Wireless sensor networks place sensors into an area to collect data and send them back to a base 
station. Data fusion, which fuses the collected data before they are sent to the base station, is usually 
implemented over the network. Since sensors are typically placed in locations accessible to malicious
attackers, information assurance of the data fusion process is very important. A witness-based approach 
has been proposed to validate the fusion data. In this approach, the base station receives the fusion data 
and "votes" on the data from a randomly chosen sensor node. The vote comes from other sensor nodes, 
called "witnesses," to verify the correctness of the fusion data. Because the base station obtains the vote 
through the chosen node, the chosen node could forge the vote if it is compromised. Thus, the witness 
node must encrypt the vote to prevent this forgery. Compared with the vote, the encryption requires 
more bits, increasing the transmission burden from the chosen node to the base station, so the chosen node
consumes more power. This work improves the witness-based approach using a direct voting mechanism
such that the proposed scheme has better performance in terms of assurance, overhead, and delay. The
witness node transmits the vote directly to the base station, so forgery by the chosen node is not a
problem in this scheme. Moreover, fewer bits are necessary to represent the vote, significantly reducing
the power consumption. Performance analysis and simulation results indicate that the overhead of the
proposed approach can be up to 40 times lower than that of the witness-based approach.
I. Introduction

Wireless sensor networks (WSNs) comprise many tiny, low-cost, battery-powered sensors in 
a small area [1], [5], [11]-[13], [21], [22]. The sensors detect environmental variations and then
transmit the detection results to other sensors or a base station [2], [4], [6], [7], [24]. One or 
several sensors then collect the detection results from other sensors. The collected data must be 
processed by the sensor to reduce the transmission burden before they are transmitted to the base 
station. This process is called data fusion and the sensor performing data fusion is the fusion 
node. The fusion data may be sent from the fusion node to the base station through multiple 
hops [10] or a direct link [19]. 

Although fusion significantly lowers the traffic between the fusion node and the base station, 
the fusion node is more critical and vulnerable to malicious attacks than non-fusion sensors [15], 
[16], [20]. If a fusion node is compromised, then the base station cannot ensure the correctness 
of the fusion data sent to it. This problem of fusion data assurance arises because the detection 
results are not sent directly to the base station, so the fusion result cannot usually be verified. 

This problem can be resolved in two ways: One is hardware-based [3], [14] and the other 
is software-based [8], [9], [17]. Since the hardware-based approach requires extra circuits to 
detect or frustrate the compromised node, the cost and continual power consumption of sensors 
are increased but still cannot guarantee protection against all attacks. Conversely, the software 
methods generally require little or no extra hardware for data assurance. However, as mentioned
in [8], [17], several copies of the fusion data must be sent to the base station, so the power
consumption for the data transmission is very high.

The witness-based approach presented by Du et al. [9] does not have this difficulty. Several 
fusion nodes are used to fuse the collected data and have the ability to communicate with the 
base station. Only one node is chosen to transmit the fusion result to the base station. The other 
fusion nodes, serving as witnesses, encrypt the fusion results to message authentication codes 
(MACs). The MACs are then sent to the base station through the chosen fusion node. Finally, the 
base station utilizes the received MACs to verify the received fusion data. The verification could 
be wrong since the chosen node could be compromised and forge MACs. The correctness of the
verification depends not only on the number of malicious fusion nodes, but also on the length of 
the MAC. A long MAC increases the reliability of the verification. However, the transmission of 
the long MAC imposes a high communication burden. If the received fusion result at the base 
station cannot pass the verification, then a polling scheme is started to determine whether any 
valid fusion result is available at the other fusion nodes. In addition to the fusion result sent 
by the malicious fusion node, several copies of the correct fusion result may also have to be 
transmitted to the base station. The transmission of the correct fusion result consumes the power 
of the uncompromised fusion node. 

This work develops a new data fusion assurance scheme to improve the witness-based method of Du
et al. [9]. The correctness of the verification in the proposed scheme depends only on the number 
of compromised fusion nodes. As in the witness-based approach, a fusion node is selected to 
transmit the fusion result, while other fusion nodes serve as witnesses. Nevertheless, the base 
station obtains votes contributing to the transmitted fusion result directly from the witness nodes. 
No valid fusion data are available if the transmitted fusion data are not approved by a pre-set 
number of witness nodes. Based on this voting mechanism, two schemes are addressed: one
needs a variable number of voting rounds and the other needs only one round of voting. In the
variant-round scheme, only one copy of the correct fusion data provided by one uncompromised fusion
node is transmitted to the base station. Analytical and simulation results reveal that the overhead
of the proposed scheme is up to 40 times lower than that of Du et al. [9]. In the one-round
scheme, the base station polls each sensor once at most. The maximum delay of the one-round
scheme is much less than that of the variant-round scheme.

The remainder of this work is organized as follows. Section II briefly addresses the problem
of data fusion assurance in WSNs and previous works on the problem. Section III describes
the variant-round scheme and analyzes its performance in terms of overhead and delay. The
description of the one-round scheme is also given. Section IV presents a performance evaluation
of the proposed approach. Concluding remarks and suggestions for future work are presented in
Section V.

Fig. 1. Structure of a wireless sensor network for distributed detection using N sensors and M fusion nodes

II. Data Fusion Assurance Problem and The Previous Work 

Figure 1 depicts a wireless sensor network for distributed detection with N sensors for collecting
environment variation data, and a fusion center for making a final decision of detections. This
network architecture is similar to the so-called SENsor with Mobile Access (SENMA) [23], [28],
Message Ferry [29], and Data Mule [18]. At the jth sensor, one observation y_j is undertaken for
one of L phenomena, indexed by i = 1, 2, ..., L. If the detection (raw) data are transmitted to the
fusion nodes without any processing, then the transmission imposes a very high communication
burden. Hence, each sensor must make a local decision based on the raw data before transmission.
The decisions, one per sensor j = 1, 2, ..., N, can be represented with fewer symbols than the raw data.¹
The sensor then transmits the local decision to M fusion nodes using broadcast. The fusion node
can combine all of the local decisions to yield a final result, and directly communicate with the
base station. Finally, one of the fusion nodes is specified to send the final result to the base
station. Unless all of the fusion nodes or all of the sensors fail, this detection and fusion scheme
can guarantee that the base station can obtain the detection result. However, the accuracy of the
result is not certain.

1 These decisions could be compressed data whose sizes depend on the applications of the WSNs.

Two problems must be solved to ensure that the base station obtains the correct result. First,
every fusion node must correctly fuse all of the local decisions, which also implies that all 
of the fusion results must be the same. Several algorithms have been proposed to deal with 
this problem [7], [25]-[27]. This work assumes that this problem has been solved. The second 
problem concerns assurance of the fusion result. The transmission between the fusion node and 
the base station is assumed herein to be error-free. Since some fusion nodes may be compromised, 
the fusion node chosen by the base station to transmit the fusion result may be one of the 
compromised nodes. Malicious data may be sent by the compromised node, and the base station
cannot distinguish the compromised nodes from the normal fusion nodes since the data detected
by the sensor are not sent directly to the base station. Consequently, the result obtained at the
base station may be incorrect. 

Du et al. [9] presented a witness-based approach to ensure the correctness of the fusion result. 
All fusion nodes, other than the chosen node, act as witnesses of the transmitted fusion result.
The witness nodes encrypt their own fusion results to MACs with private keys shared with the 
base station, and send the MACs, as "votes", to the chosen node. In the T + 1 out of M voting 
scheme, the chosen node collects all MACs from the witness nodes, and transmits them with its 
own fusion result to the base station. From the received data, the base station can determine whether
the fusion result from the chosen node is accurate. If the fusion result is supported by at least T
MACs, then the base station accepts the fusion result. Normally, $T \ge \lfloor M/2 \rfloor$. However, although
the number of compromised nodes C < T, the accepted fusion result is not always correct. If
the chosen node is compromised, then it may forge the fusion result and the MAC. Let the size
of each MAC be $k_w$ (bits). Since the number of the transmitted MACs is M - 1, the number of
the transmitted bits, in addition to the fusion result, is $(M - 1)k_w$. The probability that the base
station accepts the forged fusion result is given by

Pe= s( i ) (i) 

For instance, consider the majority voting rule in which $T + 1 \ge \lceil M/2 \rceil$. To ensure that $P_e < 2^{-10}$,
set $k_w = \lceil 2(10/(M - 1) + 1) \rceil$. Additionally, although only one copy of the fusion result is sent
to the base station in this witness-based approach, the witness nodes still require significant
communication bandwidth because the MACs of the fusion results are transmitted.
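
As a quick numerical check of the MAC length rule above, the following Python sketch evaluates $k_w$ and the resulting MAC traffic $(M - 1)k_w$ for two network sizes. The function names and the parameter `delta` (the exponent of the target bound, 10 in the text) are illustrative assumptions; treating 10 as a tunable exponent is also an assumption, since the text only states the rule for $2^{-10}$.

```python
import math

def mac_bits(num_fusion_nodes, delta=10):
    """MAC length k_w = ceil(2 * (delta / (M - 1) + 1)); delta = 10 as in the text."""
    m = num_fusion_nodes
    return math.ceil(2 * (delta / (m - 1) + 1))

def mac_traffic_bits(num_fusion_nodes, delta=10):
    """Bits sent in addition to the fusion result: (M - 1) * k_w."""
    return (num_fusion_nodes - 1) * mac_bits(num_fusion_nodes, delta)

if __name__ == "__main__":
    for m in (11, 21):
        print(m, mac_bits(m), mac_traffic_bits(m))
```

For M = 11 and M = 21 this gives MAC lengths of 4 and 3 bits, i.e., 40 and 60 bits of MAC traffic per transmitted fusion result.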

To compare the proposed scheme fairly with the witness-based approach, the overhead is
defined as the total number of bits, except the bits for one copy of the correct fusion result,
transmitted to the base station by uncompromised nodes during the data assurance process.²
The overhead can therefore be regarded as the useful power consumed for the data assurance.
Moreover, the round delay is defined as the number of rounds³ required to collect all MACs
(votes) from the witness nodes, and the polling delay as the number of votes (including all agreeing
and disagreeing votes).⁴

The overhead of the witness-based approach given in [9] is then derived as follows. If the
received fusion result is not accepted, then the base station may start a polling mechanism to
seek the correct fusion result. The base station randomly specifies another fusion node. The
new chosen node then sends its fusion result and all MACs from the witness nodes to the base
station.⁵ When the number of compromised fusion nodes, C, is greater than M - T - 1 but less
than T + 1,⁶ the fusion result is invalid and M - T fusion nodes are chosen to transmit their
fusion results to the base station in the polling process. Since the overhead defined only considers
the power consumption of uncompromised nodes, the number of uncompromised nodes among
the M - T fusion nodes must be discovered. The probability that i of the M - T fusion nodes
are uncompromised is given by

$$P_w(i) = \frac{\binom{M-T}{i}\binom{T}{M-C-i}}{\binom{M}{C}},$$

where $0 \le i \le M - T$. Let K be the number of bits representing the fusion result. The average
overhead is thus

$$O_w(M, T, C) = \sum_{i=1}^{M-T} P_w(i)\left[(M-1)\,i\,k_w + K(i-1)\right] \text{ (bits)}, \tag{1}$$

where i - 1 is used because the overhead is defined such that one copy of the correct fusion
result is not counted. Equation (1) indicates that the number of the correct fusion results
transmitted by the uncompromised fusion nodes may be up to M - T. In other words, the power
of the uncompromised nodes is significantly wasted. Moreover, because each chosen node has
to collect all MACs from the witness nodes, the average round delay, $R_w$, is M - T and the
average polling delay, $D_w$, is (M - T)(M - 1).

2 The method does not consider the power consumption of all compromised nodes since they are not useful to the WSN.

3 Or the number of fusion results sent to the base station.

4 The overall time delay can then be derived from these two delays.

5 All MACs must be sent to the base station again to avoid the denial of service since the previously chosen compromised
fusion node might modify MACs it sent to the base station. This action is not clearly presented in [9].

6 The C compromised nodes are assumed to collude to forge a wrong fusion result. Hence, if $C \ge T + 1$, then they can
successfully forge a wrong fusion result, and the base station accepts the forged result.

Conversely, when the number of uncompromised nodes is greater than T, the base station can
obtain the correct fusion result. If the base station gets the correct result at round i, meaning
that the chosen fusion nodes from round 1 to i - 1 are compromised, then the average round
delay can be given by

$$R_w = \sum_{i=1}^{C+1} i\,\frac{\binom{M-i}{C-i+1}}{\binom{M}{C}},$$

and the average polling delay is $R_w(M - 1)$. The average overhead is given by $(M - 1)k_w$.
Notice that the maximum round delay is C + 1. When the fusion result is valid, 40 and 60 bits
must be transmitted to the base station when M = 11 ($k_w = 4$, i.e., $P_e < 2^{-10}$) and M = 21
($k_w = 3$, i.e., $P_e < 2^{-10}$), respectively, setting K = 0 (in practice, K > 0). A large amount of
power must be consumed for this transmission, significantly reducing the lifetime of the fusion
node. The problem of power consumption is even worse when the fusion result is invalid. For
example, the maximum average overhead is about 109 and 314 bits for M = 11 (C = 5 and
T = 6) and M = 21 (C = 10 and T = 11), respectively. Therefore, the witness-based approach
must be enhanced.
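
The two numerical examples above can be reproduced with a short Python sketch. It assumes that the number of uncompromised nodes among the M - T polled fusion nodes follows a hypergeometric distribution, and that the round at which the first uncompromised node is chosen follows the black-white-ball counting argument; both are modeling assumptions made here for illustration, and all function names are hypothetical.

```python
from fractions import Fraction
from math import comb

def p_uncompromised(i, M, T, C):
    """P(i of the M - T polled fusion nodes are uncompromised); hypergeometric assumption."""
    return Fraction(comb(M - C, i) * comb(C, M - T - i), comb(M, M - T))

def witness_overhead(M, T, C, k_w, K=0):
    """Average overhead in bits when no valid result exists: sum over i = 1 .. M - T."""
    return sum(
        p_uncompromised(i, M, T, C) * ((M - 1) * i * k_w + K * (i - 1))
        for i in range(1, M - T + 1)
    )

def witness_round_delay(M, C):
    """Average round at which the first uncompromised node is chosen (valid-result case)."""
    return sum(
        Fraction(i * comb(M - i, C - i + 1), comb(M, C))
        for i in range(1, C + 2)
    )

if __name__ == "__main__":
    print(float(witness_overhead(11, 6, 5, k_w=4)))   # about 109 bits
    print(float(witness_overhead(21, 11, 10, k_w=3)))  # about 314 bits
    print(witness_round_delay(11, 2))
```

With K = 0 the sketch returns 1200/11 and 2200/7 bits for the two settings, which agrees with the figures of about 109 and 314 bits quoted in the text.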



February 1, 2008 DRAFT

III. Improved Voting Mechanism

The voting mechanism in the witness-based approach is designed according to the MAC of the
fusion result at each witness node. This design is reasonable when the witness node does not know
about the fusion result at the chosen node. However, in practice, the base station can transmit the
fusion result of the chosen node to the witness, or the witness node is within the communication range
of the chosen node and the base station. Therefore, the witness node can obtain the transmitted
fusion result from the chosen node through the base station or overhearing. The witness node
then can compare the transmitted fusion result with its own fusion result. Finally, the witness
node can send its vote (agreement or disagreement) on the transmitted result directly to the base
station, rather than through the chosen node.

The base station has to set up a group key for all fusion nodes to ensure that the direct voting
mechanism works.⁷ When a fusion node wishes to send its fusion result to the base station, it
adopts the group key to encrypt the result, and other fusion nodes serving as witness nodes can
decode the encrypted result. The witness node then starts to vote on the transmitted result. Two
data fusion assurance schemes are proposed based on the voting mechanism using a group key.

A. Variant-round Scheme 

In this scheme, the base station needs to ask the witness node whether it agrees or disagrees 
with the transmitted fusion result. The witness node then sends its vote to the base station. If 
the transmitted fusion result is not supported by at least T witness nodes, then the base station 
might have to select a witness node that does not agree with the transmitted result as the next 
chosen node. The detailed steps of the scheme are given as follows:

Step 1: The base station chooses a fusion node. Other fusion nodes serve as witness nodes.
Define a set of witness nodes that includes all witness nodes and let the nodes in the
set be randomly ordered. Denote M' = M - 1 as the size of the witness set in the
current round.

Step 2: The chosen node transmits its fusion result to the base station.
Step 3: The base station polls the nodes in the witness set by following the order of the witness
nodes. The polling-for-vote⁸ process does not stop until

• T witness nodes agree with the transmitted fusion result (agreeing nodes), where
1 < T < M - 1,

• M' - T + 1 witness nodes disagree with the transmitted fusion result (disagreeing
nodes), or

• all witness nodes have been polled.

7 We assume that all witnesses can overhear the fusion result sent by the chosen node. If it is not the case, the direct voting 
mechanism needs to be slightly modified. 

8 The rest of this work utilizes "voting", "polling" and "polling-for-vote" interchangeably. 
Step 4: Let A denote the number of witness nodes that agree with the transmitted fusion
result.

• If A = T, then the transmitted fusion result passes the verification of the fusion
result. Stop the polling.

• If M' - T - 1 < A < T, then no valid fusion result is available. Stop the polling.

• If A ≤ M' - T - 1, then exclude the A agreeing witness nodes from the witness
set. Let the first node that disagrees with the transmitted fusion result be the chosen
node to transmit its fusion result. Thus, the updated size of the witness set, M',
becomes M' - A - 1.⁹ Go to Step 2 for the next round of the polling.
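
Steps 1 to 4 can be sketched as a small Python simulation. This is a minimal illustration under stated assumptions, not the authors' simulator: node identifiers are integers, an uncompromised node always holds the correct result, a compromised chosen node always sends a forged result, and a compromised witness agrees with a forged result with probability `p_f` and never agrees with a correct one. The function name, return format, and `seed` parameter are hypothetical, and the continuation test uses A ≤ M' - T - 1 so that the three cases of Step 4 cover every outcome.

```python
import random

def variant_round(M, T, compromised, p_f=0.0, seed=0):
    """Simulate the variant-round scheme; returns (status, result_is_correct, rounds, polls)."""
    rng = random.Random(seed)
    nodes = list(range(M))
    rng.shuffle(nodes)                       # Step 1: random chosen node and witness order
    chosen, witnesses = nodes[0], nodes[1:]
    rounds = polls = 0
    while True:
        rounds += 1
        forged = chosen in compromised       # Step 2: chosen node transmits its result
        m_prime = len(witnesses)
        agree, disagree = [], []
        for w in witnesses:                  # Step 3: poll witnesses in order
            if len(agree) == T or len(disagree) == m_prime - T + 1:
                break
            polls += 1
            if w in compromised:
                vote = forged and rng.random() < p_f
            else:
                vote = not forged
            (agree if vote else disagree).append(w)
        a = len(agree)                       # Step 4: decide on the outcome of the round
        if a == T:
            return "accepted", not forged, rounds, polls
        if a > m_prime - T - 1:
            return "no valid result", None, rounds, polls
        chosen = disagree[0]                 # first disagreeing node becomes the chosen node
        excluded = set(agree) | {chosen}
        witnesses = [w for w in witnesses if w not in excluded]

if __name__ == "__main__":
    print(variant_round(11, 5, compromised={0, 1}, p_f=0.0, seed=1))
```

Keeping the surviving witnesses in their original order means that nodes left unpolled in one round stay at the end of the witness set in the next round, which is the ordering the analysis relies on.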

B. Analysis of the Variant-round Scheme 

This analysis assumes that the compromised node always transmits the forged fusion result 
while the compromised node is chosen to send its fusion result. When a compromised node serves 
as a witness node, it always disagrees with the correct fusion result, and agrees with the forged 
fusion result with a probability Pf. If the compromised node attempts to make the base station 
accept the forged fusion result, then it always agrees with the fusion result transmitted by other 
compromised nodes, i.e., Pf = 1, and at most two rounds of voting have to be run. Conversely, 
if the compromised node wants to make the polling-for-vote process run as long as possible, 
then it always disagrees with the transmitted fusion result, i.e., Pf = 0. The performance of 
the variant-round scheme when Pf = 0 is analyzed next. The Appendix describes the performance
analysis of the variant-round scheme when Pf = 1. When Pf ≠ 0, 1, the overheads are presented
by computer simulations in the next section. Since all compromised nodes (uncompromised nodes)
have the same behavior in the following analysis, the analysis can be treated as the problem 
of counting for C black balls (compromised nodes) and M — C white balls (uncompromised 
nodes) together. 

9 The number of nodes performing the polling-for-vote at the next round becomes M' - A.
If the compromised node always disagrees with the transmitted fusion result, then no forged
fusion result is accepted. Two cases must be addressed:

Case 1: C ≥ M - T,

Case 2: C < M - T.
Note that the valid fusion result is not available in Case 1.

Assume that the chosen node at the first round is compromised. The probability that the chosen
node is compromised at the first round is given by C/M. The first-round voting finishes when
M - T witness nodes do not agree with the transmitted fusion result, as described in Step 3.
Thus, the polling order (i.e., the order of witness nodes as described in Step 1) determines the
number of witness nodes that the base station has to poll at this round of voting. The number
of possible polling orders in the sense of the black-white-ball model is given by

$$N_{v1}^{c1} = \frac{(M-1)!}{(C-1)!\,(M-C)!} = \binom{M-1}{C-1},$$

where the subscript, v1, denotes the first case of the variant-round scheme, and the superscript,
c1, represents the first round of voting when the chosen node is compromised. Since the chosen
node at the first-round voting is compromised, the polling stops after M - T witness nodes
have been polled. Moreover, the number of unpolled nodes is M - 1 - (M - T) = T - 1,
and the number of uncompromised nodes among the unpolled nodes is M - C - i if there
are i uncompromised nodes polled. Thus, the probability that i of the M - T polled nodes are
uncompromised, where $0 \le i \le M - T$,¹⁰ is then written by

$$\frac{1}{N_{v1}^{c1}} \binom{M-T}{i} \binom{T-1}{M-C-i}.$$

No node is excluded from the witness set. The number of compromised nodes and the size of the
witness set at the second round become C - 1 and M - 2, respectively. The average overhead,
when the chosen node at the first round is compromised, is expressed recursively by

$$O_{v1}^{c}(M,T,C,0) = \sum_{i=0}^{M-T} \frac{1}{N_{v1}^{c1}} \binom{M-T}{i} \binom{T-1}{M-C-i}\, i k' + O_{v1}(M-1,T,C-1,0), \tag{2}$$

where $O_{v1}(M-1,T,C-1,0)$ represents the average overhead when the number of fusion nodes
is M - 1 and the number of compromised nodes is C - 1, and k (k') is the number of bits
that a witness must send to the base station when it agrees (disagrees) with the transmitted
fusion result.¹¹ Moreover, the average round delay and the average polling delay under the same
condition are represented by

$$R_{v1}^{c}(M,T,C,0) = 1 + R_{v1}(M-1,T,C-1,0), \tag{3}$$

and

$$D_{v1}^{c}(M,T,C,0) = M - T + D_{v1}(M-1,T,C-1,0), \tag{4}$$

where $R_{v1}(M-1,T,C-1,0)$ and $D_{v1}(M-1,T,C-1,0)$ are the average round delay and
polling delay, respectively, when the number of fusion nodes performing the polling-for-vote is
M - 1 and the number of compromised nodes among them is C - 1.

10 Actually, max{M - T - C + 1, 0} ≤ i. Since $\binom{a}{b}$ is 0 when b > a, 0 is adopted as the lower bound of i.

Next, suppose that the chosen node at the first round is not compromised, which has a
probability of (M - C)/M. The number of the possible polling orders is given by

$$N_{v1}^{u1} = \binom{M-1}{C},$$

where the superscript, u1, denotes the first round of polling while the chosen node is uncompromised.
When the polling stops at witness node j, the node does not agree with the transmitted
result and the base station has polled M - T disagreeing nodes (including witness node j).
Moreover, M - j - 1 nodes are unpolled, and C - (M - T) = T + C - M of these are
compromised. Since the witness set has M - C - 1 uncompromised nodes, the maximum number
of polled witness nodes is (M - C - 1) + (M - T). Thus, the probability that the polling stops
at the jth witness node, where $M - T \le j \le (M - C - 1) + (M - T) = 2M - T - C - 1$, is
given by

$$P_{v1}^{u1}(j) = \frac{1}{N_{v1}^{u1}} \binom{j-1}{M-T-1} \binom{M-j-1}{T+C-M}.$$

Since the number of unpolled nodes M - j - 1 is less than T, these nodes will never be polled
in the following runs before the voting mechanism stops. Hence, only compromised nodes are
polled after the first round and no uncompromised nodes are further polled. Note that the
number of uncompromised nodes among the polled nodes is j - (M - T) = j - M + T. Therefore,
the average overhead, when the chosen node at the first round is not compromised, is then given
by

$$O_{v1}^{u}(M,T,C,0) = \sum_{j=M-T}^{2M-T-C-1} P_{v1}^{u1}(j)\,(j - M + T)\,k. \tag{5}$$

Moreover, the size of the witness set after the first round becomes M' = (M - j - 1) + (M -
T) - 1 = 2M - T - j - 2 (the total number of unpolled and disagreeing nodes minus 1). The
polling process stops if

$$M' \le T - 1 \;\Longleftrightarrow\; 2M - T - j - 2 \le T - 1 \;\Longleftrightarrow\; j \ge 2M - 2T - 1.$$

Otherwise, the size is decreased by 1 in each following round until it becomes T. Consequently,
the total number of the following rounds is 2M - T - j - 2 - T + 1 = 2M - 2T - j - 1 and
then the number of the total rounds is 2M - 2T - j - 1 + 1 = 2M - 2T - j. The average round
delay is then represented by

$$R_{v1}^{u}(M,T,C,0) = \sum_{j=M-T}^{2M-2T-2} P_{v1}^{u1}(j)\,(2M - 2T - j) + \sum_{j=2M-2T-1}^{2M-T-C-1} P_{v1}^{u1}(j). \tag{6}$$

11 The bits sent to the base station when a node agrees with the fusion result are separated from those sent when the node
disagrees with the result, since the node can be silenced when it agrees with the result, while only a few bits are sent when it
disagrees.
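Since the voting process will stop when M' - T + 1 nodes are polled at each round, the polling
delay is

$$D_{v1}^{u}(M,T,C,0) = \sum_{j=M-T}^{2M-T-C-1} P_{v1}^{u1}(j)\, j + \sum_{j=M-T}^{2M-2T-2} P_{v1}^{u1}(j)\, \frac{(2M - 2T - j)(2M - 2T - j - 1)}{2}. \tag{7}$$

Equations (2) to (7) and the initial conditions then give the average overhead and the average
delays of Case 1 as

$$O_{v1}(M,T,C,0) = \begin{cases} 0, & M \le T \\ \frac{C}{M}\, O_{v1}^{c}(M,T,C,0) + \frac{M-C}{M}\, O_{v1}^{u}(M,T,C,0), & \text{else,} \end{cases}$$

$$R_{v1}(M,T,C,0) = \begin{cases} 0, & M \le T \\ \frac{C}{M}\, R_{v1}^{c}(M,T,C,0) + \frac{M-C}{M}\, R_{v1}^{u}(M,T,C,0), & \text{else,} \end{cases}$$

$$D_{v1}(M,T,C,0) = \begin{cases} 0, & M \le T \\ \frac{C}{M}\, D_{v1}^{c}(M,T,C,0) + \frac{M-C}{M}\, D_{v1}^{u}(M,T,C,0), & \text{else.} \end{cases}$$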
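
The Case 1 terms for an uncompromised first chosen node can be checked numerically. The Python sketch below evaluates the stopping-position probability and the sums in Equations (5) and (6) with exact fractions; the function names are hypothetical, and the binomial form of the probability is the black-white-ball count described in the text (the first j - 1 polled nodes contain M - T - 1 compromised nodes, node j is compromised, and the M - j - 1 unpolled nodes contain T + C - M compromised nodes).

```python
from fractions import Fraction
from math import comb

def p_stop(j, M, T, C):
    """P(first-round polling stops at witness j), chosen node uncompromised, Case 1."""
    return Fraction(
        comb(j - 1, M - T - 1) * comb(M - j - 1, T + C - M),
        comb(M - 1, C),
    )

def case1_uncompromised_first(M, T, C, k=1):
    """Return (total probability, average overhead as in (5), average round delay as in (6))."""
    positions = range(M - T, 2 * M - T - C)      # j = M - T .. 2M - T - C - 1
    total = sum(p_stop(j, M, T, C) for j in positions)
    overhead = sum(p_stop(j, M, T, C) * (j - M + T) * k for j in positions)
    rounds = sum(
        p_stop(j, M, T, C) * ((2 * M - 2 * T - j) if j <= 2 * M - 2 * T - 2 else 1)
        for j in positions
    )
    return total, overhead, rounds

if __name__ == "__main__":
    print(case1_uncompromised_first(11, 6, 5))
```

For M = 11, T = 6, and C = 5 the probabilities sum to one, the average overhead is 25/6 agreeing votes (times k bits), and the average number of rounds is 4/3.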



The second case, i.e., C < M - T, produces a valid fusion result. Similarly, if the chosen node
at the first-round polling is compromised, then the polling stops after M - T witness nodes are
polled. The average overhead and the average delays, when the chosen node at the first round
is compromised, are expressed respectively as

$$O_{v2}^{c}(M,T,C,0) = \sum_{i=M-T-C+1}^{M-T} \frac{1}{\binom{M-1}{C-1}} \binom{M-T}{i} \binom{T-1}{M-C-i}\, i k' + O_{v2}(M-1,T,C-1,0),$$

$$R_{v2}^{c}(M,T,C,0) = 1 + R_{v2}(M-1,T,C-1,0),$$

$$D_{v2}^{c}(M,T,C,0) = M - T + D_{v2}(M-1,T,C-1,0).$$

Only one round of polling is needed when the chosen node is uncompromised at the first round.
When the polling stops at witness node j, the node agrees with the transmitted result and the
base station has polled T agreeing nodes (including witness node j). Moreover, M - j - 1 nodes
are unpolled, and M - 1 - C - T of these are uncompromised. The probability that the polling
process ends at the jth witness node, where $T \le j \le T + C$, is given by

$$P_{v2}^{u1}(j) = \frac{1}{\binom{M-1}{C}} \binom{j-1}{T-1} \binom{M-j-1}{M-1-C-T}.$$

The number of polled uncompromised nodes is T. The average overhead when the chosen node
is uncompromised at the first round is represented as

$$O_{v2}^{u}(M,T,C,0) = \sum_{j=T}^{T+C} P_{v2}^{u1}(j)\, T k,$$

and the average polling delay is

$$D_{v2}^{u}(M,T,C,0) = \sum_{j=T}^{T+C} P_{v2}^{u1}(j)\, j.$$

Consequently, the average overhead $O_{v2}(M,T,C,0)$, the average round delay $R_{v2}(M,T,C,0)$,
and the average polling delay $D_{v2}(M,T,C,0)$ can be represented as

$$O_{v2}(M,T,C,0) = \begin{cases} 0, & M \le T \\ Tk, & M > T \text{ and } C = 0 \\ \frac{C}{M}\, O_{v2}^{c}(M,T,C,0) + \frac{M-C}{M}\, O_{v2}^{u}(M,T,C,0), & \text{else,} \end{cases}$$

$$R_{v2}(M,T,C,0) = \begin{cases} 0, & M \le T \\ 1, & M > T \text{ and } C = 0 \\ \frac{C}{M}\, R_{v2}^{c}(M,T,C,0) + \frac{M-C}{M}, & \text{else,} \end{cases}$$

$$D_{v2}(M,T,C,0) = \begin{cases} 0, & M \le T \\ T, & M > T \text{ and } C = 0 \\ \frac{C}{M}\, D_{v2}^{c}(M,T,C,0) + \frac{M-C}{M}\, D_{v2}^{u}(M,T,C,0), & \text{else.} \end{cases}$$
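
The Case 2 recursion for the average overhead can be evaluated directly. The following Python sketch is a minimal illustration for Pf = 0 and C < M - T; the function name and the default bit counts k = 0, k' = 1 are assumptions chosen for the example, and the upper summation limit is capped at M - C so that binomial coefficients with a negative lower index (which are zero in the formula) are never passed to `math.comb`.

```python
from fractions import Fraction
from math import comb

def overhead_v2(M, T, C, k=0, k_prime=1):
    """Average overhead (bits) of the variant-round scheme, Case 2 (C < M - T), Pf = 0."""
    if M <= T:
        return Fraction(0)
    if C == 0:
        return Fraction(T * k)
    # Chosen node compromised (probability C / M): M - T witnesses are polled,
    # i of them uncompromised, each sending k' bits to disagree.
    orders = comb(M - 1, C - 1)
    low = max(M - T - C + 1, 0)
    high = min(M - T, M - C)
    first_round = sum(
        Fraction(comb(M - T, i) * comb(T - 1, M - C - i), orders) * i * k_prime
        for i in range(low, high + 1)
    )
    o_c = first_round + overhead_v2(M - 1, T, C - 1, k, k_prime)
    # Chosen node uncompromised (probability (M - C) / M): T agreeing votes of k bits.
    o_u = Fraction(T * k)
    return Fraction(C, M) * o_c + Fraction(M - C, M) * o_u

if __name__ == "__main__":
    for c in (0, 1, 2):
        print(c, overhead_v2(11, 5, c))
```

With M = 11, T = 5, k = 0, and k' = 1, the sketch gives an average overhead of 6/11 bit for C = 1 and 59/55 bits for C = 2, far below the 40 bits of MAC traffic required by the witness-based approach for the same M.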



An interesting property of the variant-round scheme is that throughout the polling process, at
most one fusion result is transmitted from all of the uncompromised nodes to the base station.
Hence, the overhead of the scheme is independent of the size of the fusion result.¹² This claim
can be proven by the following argument. In the case of a valid fusion result, when the fusion
result from an uncompromised node is sent to the base station at a round of polling, the polling
process stops at this round and the valid fusion result is obtained by the base station. Accordingly,
only one valid fusion result is sent to the base station by all uncompromised nodes. In the case
of no valid fusion result, the number of uncompromised nodes as witnesses is less than T. If a
round of the polling process is the first time that the chosen node is uncompromised, this round
terminates when either all witness nodes have been polled or M' - T + 1 witness nodes disagree
with the transmitted fusion result. In the former case, the polling process stops. In the latter
case, another polling round is required. Importantly, in the next round all uncompromised nodes
will be the last T - 1 nodes in the witness set and will not then be chosen to send any fusion
result before the polling process is completed.¹³ Therefore, only one fusion result is sent by all
uncompromised nodes when no valid result can be obtained by the base station.

12 Note that the overhead is defined as the total number of bits transmitted minus the bits of the correct fusion result.

C. One-round Scheme 

The number of rounds in the above scheme is not fixed. Hence, the delay varies. A varying
delay is undesirable in some applications such as real-time systems. This work therefore proposes
another scheme that is based on the improved voting mechanism. In this scheme the base station may
receive different fusion results from the witness nodes, and all received fusion results must
be stored. This scheme has a fixed delay and is summarized as follows:

Step 1: The base station randomly chooses a fusion node. Other fusion nodes serve as witness 
nodes. Define a set of witness nodes that includes all of the witness nodes and let the 
nodes in the set be randomly ordered. 
Step 2: The chosen node transmits its fusion result to the base station. The base station sets 
the fusion result as a potential voting result and the number of agreeing votes for the 
fusion result is set to be zero. 
Step 3: The base station polls the nodes, with the voting result, one by one in the witness set, 
following the order of the witness nodes. The witness node compares its fusion result 
with the voting result. 

13 In the next round, all uncompromised polled nodes are deleted from the witness set according to the scheme. 
• If the witness node agrees with the voting result, it sends an agreeing vote to the
base station. The base station increases the number of agreeing votes for the voting 
result by one. 

• If the witness node does not agree with the voting result, it transmits its fusion 
result to the base station. 

- If the fusion result has been stored in the base station, then the base station 
increases the number of agreeing votes for the fusion result by one. 

- If the fusion result has not been stored in the base station, then the base station 
stores the fusion result and the number of agreeing votes for the fusion result 
is set to be zero. 

The base station sets the voting result to the received fusion result with the maximum
number of agreeing votes before polling the next witness node. If two or more fusion results
have received the maximum number of votes, then the voting result is set to the result with
the most recent vote. The polling stops when any received fusion result receives T
votes or when the number of unpolled nodes plus the maximum number of votes for
the results recorded at the base station is less than T.
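
The one-round scheme can be sketched in a few lines of Python. This is an illustration under assumptions rather than the authors' implementation: fusion results are modeled as hashable values, `witness_results` lists each witness's own fusion result in polling order, and the moment a new result is first stored is treated as its most recent vote for tie-breaking (the text does not specify this detail). The function name and return format are hypothetical.

```python
def one_round(chosen_result, witness_results, T):
    """Simulate the one-round scheme; returns (accepted_result_or_None, polls, results_sent)."""
    votes = {chosen_result: 0}        # Step 2: chosen node's result starts with zero votes
    last_seen = {chosen_result: 0}    # poll index of the most recent vote per stored result
    voting = chosen_result
    results_sent = 0                  # full fusion results transmitted by witnesses
    total = len(witness_results)
    for n, own in enumerate(witness_results, start=1):   # Step 3: poll each witness once
        if own == voting:
            votes[voting] += 1        # witness sends only an agreeing vote
        else:
            results_sent += 1         # witness transmits its own fusion result
            if own in votes:
                votes[own] += 1
            else:
                votes[own] = 0
        last_seen[own] = n
        # Voting result: most votes, ties broken by the most recent vote.
        voting = max(votes, key=lambda r: (votes[r], last_seen[r]))
        if votes[voting] >= T:
            return voting, n, results_sent
        if (total - n) + votes[voting] < T:
            return None, n, results_sent
    return None, total, results_sent

if __name__ == "__main__":
    print(one_round("forged", ["good", "good", "good", "forged"], T=2))
```

Because every witness is polled at most once, the number of polls is bounded by the size of the witness set, which is what gives the scheme its fixed maximum delay.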

IV. Performance Evaluation 

In this section numerical and computer simulations are performed to evaluate the performance
of the proposed schemes. The performance of the proposed variant-round scheme is numerically
calculated from the results given in Section III when Pf = 0 and 1. The performance of the
variant-round scheme when Pf ≠ 0 or 1 and that of the one-round scheme are evaluated using
Monte Carlo computer simulations. The proposed schemes are compared with the witness-based
approach in terms of overhead, average round delay, and average polling delay.
For the witness-based approach, the size of each MAC, k_w, is assumed to be four bits. In
the evaluation of the overhead of the variant-round scheme, the size of the fusion result is zero
(i.e., K = 0), such that the witness-based approach has the best performance. As stated at the
end of Subsection III-B, the overhead of the variant-round scheme is independent of the size
of the fusion result. Under this assumption, the variant-round scheme is inferred to have better
overhead performance than the witness-based approach for all sizes of the fusion result whenever
it outperforms with a zero-sized fusion result. All results are presented for number of nodes
M = 11.

Figure 2 shows the overheads of the variant-round scheme when Pf = 0. As in the example 
in Section III, k = k' = 4 is set first, and then k = 1, k' = 0 and k = 0, k' = 1 are considered. 
This figure indicates that setting k = 1 and k' = 0 significantly reduces the overhead of the 
proposed scheme, independently of the fusion result at the base station. Setting k = 0 and k' = 1 
further reduces the overhead of the proposed scheme. Next, the variant-round (VR) scheme is 
compared with the witness-based approach. In the variant-round scheme, k = 0 and k' = 1 
are set. Figure 3 compares the overheads. The variant-round scheme significantly outperforms the 
witness-based approach regardless of the fusion result at the base station. For example, according 
to Fig. 3, when T = 5 and M = 11, the variant-round scheme is almost 40 times better than 
the witness-based approach given in [9] in terms of overhead performance for C = 1 and 2. 
Figure 4 compares the average round delays of the proposed variant-round scheme and the 
witness-based approach. This figure demonstrates that the average round delays of the proposed 
scheme are smaller than those of the witness-based approach when the base station can obtain 
valid fusion results; however, the two perform equally when the base station obtains invalid fusion 
results. Figure 5 compares the average polling delays of the proposed variant-round scheme and 
the witness-based approach. In Fig. 5, the proposed scheme has much smaller average polling 
delays than the witness-based approach, unlike the average round delay performance. For example, 
when T = 10 and M = 11, the proposed variant-round scheme is almost five times better than 
the witness-based approach in terms of average polling delay performance for C = 4 and 5. 
Accordingly, the proposed scheme outperforms the witness-based approach given in [9] in terms 
of overhead and delay. 

14 Similar results can be obtained for M = 21, but omitted because of the page limit. 

TABLE I 

Maximum average overheads (bits) for variant-round (VR) scheme and witness-based approach given in [9] 

          Witness-based   VR, Pf = 0   VR, Pf = 0.25   VR, Pf = 0.5   VR, Pf = 0.75   VR, Pf = 1 
M = 11    109             2.5          2.5             2.4            2.7             3.2 
M = 21    314             4.9          4.7             4.7            4.9             6.2 

The following computer simulations evaluate the variant-round scheme when Pf = 0.25, 0.5, 
and 0.75 by performing 10000 Monte Carlo tests for each simulation. In the first set of 
simulations, the witness-based approach given in [9] is compared with the proposed variant-round 
scheme when Pf = 0.25, 0.5, and 0.75. Figure 6 presents the results for M = 11, k = 0, and 
k' = 1. The variant-round scheme outperforms the witness-based approach for every Pf simulated. 
For example, in Fig. 6, when T = 5 the proposed variant-round scheme is almost 40 times better 
than the witness-based approach given in [9] in terms of overhead performance for C = 1, 2 
and Pf = 0.25, 0.5, 0.75. Finally, Table I summarizes the maximum average overhead of the 
variant-round scheme and the witness-based approach for M = 11 and 21. 
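
As a quick arithmetic check, the ratio between the tabulated maxima of the two approaches can be computed directly from the Table I values. The sketch below only reuses the numbers printed in the table; the dictionary and function names are illustrative. These are ratios of maximum average overheads, which is not the same quantity as the per-setting comparison (T = 5, C = 1, 2) quoted from the figures.

```python
# Maximum average overheads (bits) copied from Table I.
witness_based = {11: 109.0, 21: 314.0}
variant_round = {
    11: {0: 2.5, 0.25: 2.5, 0.5: 2.4, 0.75: 2.7, 1: 3.2},
    21: {0: 4.9, 0.25: 4.7, 0.5: 4.7, 0.75: 4.9, 1: 6.2},
}


def overhead_ratios(m):
    """Witness-based overhead divided by VR overhead, for each tabulated Pf."""
    return {pf: witness_based[m] / bits for pf, bits in variant_round[m].items()}


for m in (11, 21):
    for pf, ratio in overhead_ratios(m).items():
        print(f"M = {m}, Pf = {pf}: ratio = {ratio:.1f}")
```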

In the second set of simulations, the average number of bits sent by uncompromised nodes in 
the one-round scheme is evaluated. When the compromised node does not agree with the voting 
result, the fusion result transmitted by the compromised node is different from other fusion 
results and the size of the fusion result is K = 48. Fig. 7 shows the results for M = 11 when 
Pf = 0, 0.5, and 1. In Fig. 7(a), when the base station can obtain a valid fusion result, the 
number of bits transmitted by the uncompromised nodes to the base station in the one-round 
scheme increases with the number of compromised nodes C, as expected. However, it is smaller 
than that in the witness-based approach. Notably, in this case, the number of bits transmitted by 
uncompromised nodes in the witness-based approach is constant, since once an uncompromised 
node is polled, the polling process is completed. For small C, such as C = 0, 1, 2, or 3, the 
number of bits transmitted by uncompromised nodes in the one-round scheme is about half of that 
in the witness-based approach. Additionally, the performance of the one-round scheme when Pf = 1 
is the worst in all three simulations, since a compromised node that agrees with the forged result 
will sometimes give the forged result the largest number of votes and force the base station 
to use it as the temporary voting result to poll the next node. Then, the next uncompromised 
node needs to transmit its fusion result to the base station instead of only sending an agreeing 
vote, and the total number of bits transmitted by the uncompromised nodes increases. 

Fig. 2. The overheads of the variant-round scheme, for M = 11 and Pf = 0, (a) when T = 5 (valid fusion result) and (b) 
when T = 10 (invalid fusion result). 

Fig. 3. Overhead comparison between the variant-round (VR) scheme and the witness-based approach given in [9], for M = 11 
and Pf = 1, 0, (a) when T = 5 (valid fusion result) and (b) when T = 10 (invalid fusion result). 

Fig. 4. Average round delay comparison between the variant-round (VR) scheme and the witness-based approach given in [9], 
for M = 11 and Pf = 1, 0, (a) when T = 5 (valid fusion result) and (b) when T = 10 (invalid fusion result). 

Fig. 5. Average polling delay comparison between the variant-round (VR) scheme and the witness-based approach given in [9], 
for M = 11 and Pf = 1, 0, (a) when T = 5 (valid fusion result) and (b) when T = 10 (invalid fusion result). 

According to Fig. 7(b), when the base station cannot obtain a valid fusion result, the number 
of bits transmitted by the uncompromised nodes to the base station in the one-round scheme 
decreases with the number of compromised nodes C, except for Pf = 1. This phenomenon is 
caused by the fact that the scheme stops when the number of unpolled nodes plus the maximum 
number of votes for any result recorded at the base station is less than T. When T = M - 1 as 
simulated, the recording of two different results at the base station stops the polling process. Recall 
that when Pf = 1, the only way to stop the polling process is for one fusion result to be sent by 
an uncompromised node and the other to be sent by a compromised node, so the number of bits 
transmitted by the uncompromised nodes is the same for all C. If Pf ≠ 1, the two compromised 
nodes may yield two different results, and no bit is transmitted by the uncompromised node. This 
concludes the simulation results. This subfigure reveals that the one-round scheme outperforms 
the witness-based approach when C is small but loses when C exceeds T/2. Similarly, the maximum 
average number of bits in the one-round scheme is compared with that of the witness-based 
approach for M = 11 and 21, and the results are given in Table II. This table indicates that the 
maximum average number of bits transmitted in the one-round scheme is much lower than that in 
the witness-based approach. 

V. Conclusions and Future Work 

This work proposes a power-efficient scheme for data fusion assurance, in which the base 
station in the wireless sensor network collects the fusion data and the votes on the data directly 
from the fusion nodes. The proposed scheme is more reliable, with less assurance overhead and 
delay, than the witness-based approach. That is, the power and delay for the transmission of the 
fusion result and the votes are significantly decreased. 

Fig. 6. Overhead comparison between the variant-round (VR) scheme and the witness-based approach given in [9], for M = 11 
and Pf = 0.25, 0.5, 0.75, (a) when T = 5 (valid fusion result) and (b) when T = 10 (invalid fusion result). 

Fig. 7. Overhead comparison between the one-round (OR) scheme and the witness-based approach given in [9], for M = 11 
and Pf = 0, 0.5, 1, (a) when T = 5 (valid fusion result) and (b) when T = 10 (invalid fusion result). 

TABLE II 

Maximum average overheads (bits) for one-round (OR) scheme and witness-based approach given in [9] 

          Witness-based   OR, Pf = 0   OR, Pf = 0.5   OR, Pf = 1 
M = 11    240             71.5         86.7           128.6 
M = 21    566             76.1         106.4          205.7 

In the future, we will investigate the performance when a node is compromised with some 
probability, both statistically dependent on and independent of other nodes. Moreover, the proposed 
scheme cannot be applied to multi-hop WSNs. We will develop other schemes based on the 
direct voting mechanism for multi-hop WSNs. 

Appendix 

Performance Analysis of the Variant-round Scheme when Pf = 1 

In this analysis, we assume that 2T + 1 > M. Since the base station obtains a forged fusion 
result when C > T, which should be avoided, only two cases are considered: 
Case 1: M - T ≤ C ≤ T, 
Case 2: C < M - T. 

Case 1 does not produce a valid fusion result. Assume that the chosen node at the first round 
of polling is compromised. The probability that the chosen node is compromised at the first 
round is given by C/M. The first-round polling-for-vote process finishes when M - T witness 
nodes do not agree with the transmitted fusion result, as described in Step 3. When the polling 
stops at the ith witness node, the ith witness node (uncompromised) does not agree with the 
transmitted result and the base station has polled M - T disagreeing nodes (including the ith 
witness node). Moreover, M - i - 1 nodes are unpolled, and M - C - (M - T) = T - C of them 
are uncompromised. Since the witness set contains C - 1 compromised nodes, the maximum 
number of polled witness nodes is (C - 1) + (M - T) = M - T + C - 1. Hence, the probability 
that the polling stops at the ith witness node, where M - T ≤ i ≤ M - T + C - 1, is given by 

$$P^{c1}_{v1}(i) = \frac{1}{\Pi^{c1}_{v1}} \binom{i-1}{M-T-1} \binom{M-i-1}{T-C}, \qquad \Pi^{c1}_{v1} = \binom{M-1}{C-1}.$$



The number of agreeing nodes, A, equals i - (M - T). The first witness node that disagreed 
with the transmitted fusion result at the first round becomes the new chosen node at the second 
round, as stated in Step 4. The size of the witness set becomes M' = (M - 1) - (i - M + T) - 1 = 
2M - T - i - 2. If M' < T, which is equivalent to (M - 1) - T - 1 < A, then the polling 
stops as described in the second part of Step 4. Otherwise, since the previous chosen fusion 
node is compromised, the current chosen fusion node is not compromised. At the second round, 
the witness set now has C' = C - 1 - (i - (M - T)) = M + C - T - i - 1 compromised nodes 
and M - C - 1 uncompromised nodes. As implied in Step 3, the base station first polls the 
M - T - 1 witness nodes that disagreed with the previous transmitted fusion result in the first 
round. The witness set contains M - i - 1 unpolled nodes, of which M - C - (M - T) = T - C 
are uncompromised. Consequently, the number of the possible polling orders is given by 



$$\Pi^{c2}_{v1} = \binom{M-i-1}{T-C}.$$

When the second round of polling stops at the jth witness node, the jth witness node (which 
is a compromised node) does not agree with the transmitted result. The base station has polled 
M' - T + 1 disagreeing nodes (including the jth witness node) and j - (M' - T + 1) = j - M' + T - 1 
agreeing (uncompromised) nodes. The first M - T - 1 of the j polled nodes are uncompromised, 
and the remaining j - (M - T - 1) - 1 nodes (including the jth witness node) may be 
uncompromised or compromised. Since the j polled nodes include M' - T + 1 disagreeing 
(compromised) nodes, the M' - j unpolled nodes include C' - (M' - T + 1) compromised 
nodes. The probability that the second polling process stops at the jth witness node, where 
j ≥ (M' - T + 1) + (M - T - 1) = M' + M - 2T and j ≤ (M' - T + 1) + (M - C - 1) = 
M - C + M' - T (disagreeing nodes and uncompromised nodes), is given by 

$$P^{c2}_{v1}(j) = \frac{1}{\Pi^{c2}_{v1}} \binom{j-(M-T-1)-1}{M'-T} \binom{M'-j}{C'-(M'-T+1)} = \frac{1}{\Pi^{c2}_{v1}} \binom{j-M+T}{2M-2T-i-2} \binom{2M-T-i-j-2}{T+C-M}.$$



Since A is at least M - T - 1 at the second round, it is easy to see that M' - T - 1 < A and 
the scheme will stop after the second round. Therefore, the average overhead is 

$$O^{c}_{v1}(M,T,C,1) = \sum_{i=M-T}^{M-T+C-1} P^{c1}_{v1}(i) \left[ (M-T)k' + \sum_{j=M'+M-2T}^{M-C+M'-T} P^{c2}_{v1}(j)\,(j-M'+T-1)\,k \right]. \tag{8}$$



As mentioned before, when M' < T, i.e., 2M - 2T - 2 < i, the scheme will stop after the 
first round. Hence, the average round delay is 

$$R^{c}_{v1}(M,T,C,1) = \sum_{i=M-T}^{2M-2T-2} 2P^{c1}_{v1}(i) + \sum_{i=2M-2T-1}^{M-T+C-1} P^{c1}_{v1}(i), \tag{9}$$



and the average polling delay is 

$$D^{c}_{v1}(M,T,C,1) = \sum_{i=M-T}^{2M-2T-2} P^{c1}_{v1}(i) \left[ i + \sum_{j=M'+M-2T}^{M-C+M'-T} j\,P^{c2}_{v1}(j) \right] + \sum_{i=2M-2T-1}^{M-T+C-1} i\,P^{c1}_{v1}(i). \tag{10}$$



In the first case, the probability that the chosen node is not compromised at the first round is 
given by (M - C)/M. The number of the possible polling orders is written as 

$$\Pi^{u1}_{v1} = \binom{M-1}{C}.$$

Thus, the probability that the polling stops at the ith witness node, where M - T ≤ i ≤ 
(M - C - 1) + (M - T) = 2M - T - C - 1, is expressed as 

$$P^{u1}_{v1}(i) = \frac{1}{\Pi^{u1}_{v1}} \binom{i-1}{M-T-1} \binom{M-i-1}{T+C-M}.$$
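
As a sanity check on this first-round distribution, the sketch below evaluates the stopping probability for an uncompromised chosen node and confirms that it sums to one over M - T ≤ i ≤ 2M - T - C - 1. The function name and the example values (M = 11, T = 7, C = 5) are illustrative assumptions. The coded formula is the product of two binomial coefficients divided by the number of polling orders, as read from the expression above.

```python
from fractions import Fraction
from math import comb


def first_round_stop_pmf(M: int, T: int, C: int) -> dict:
    """P(polling stops at witness i), uncompromised chosen node, Pf = 1."""
    orders = comb(M - 1, C)                  # placements of the C compromised witnesses
    pmf = {}
    for i in range(M - T, 2 * M - T - C):    # i = M-T, ..., 2M-T-C-1
        # Witness i is the (M-T)th disagreeing (compromised) node polled;
        # the remaining T+C-M compromised nodes lie among the unpolled ones.
        ways = comb(i - 1, M - T - 1) * comb(M - i - 1, T + C - M)
        pmf[i] = Fraction(ways, orders)
    return pmf


pmf = first_round_stop_pmf(M=11, T=7, C=5)   # Case 1: M - T <= C <= T
print(sum(pmf.values()))                     # exact total probability
```

Exact rational arithmetic (Fraction) is used so that the normalization can be checked without floating-point tolerance.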



At the second round, the base station chooses the first witness node that disagreed with the 
transmitted fusion result sent in the previous round. The size of the witness set becomes M' = 
2M - T - i - 2. The witness set has M - C - (i - (M - T)) - 1 = 2M - C - T - i - 1 
uncompromised nodes. Similarly, the polling stops if M' < T. Otherwise, since the previous 
chosen fusion node is not compromised, the current chosen fusion node is compromised. Thus, 
the number of the possible polling orders is given by 

$$\Pi^{u2}_{v1} = \binom{M-i-1}{T+C-M}.$$

When the second round of polling stops at the jth witness node, the j polled nodes contain 
M' - T + 1 disagreeing (uncompromised) nodes and the unpolled nodes include 2M - C - 
T - i - 1 - (M' - T + 1) uncompromised nodes. The probability that the second round of 
polling finishes at the jth witness node, where (M' - T + 1) + (C - 1) = M' - T + C ≥ j ≥ 
(M' - T + 1) + (M - T - 1) = M' + M - 2T, is given by 

$$P^{u2}_{v1}(j) = \frac{1}{\Pi^{u2}_{v1}} \binom{j-(M-T-1)-1}{M'-T} \binom{M'-j}{2M-C-T-i-1-(M'-T+1)} = \frac{1}{\Pi^{u2}_{v1}} \binom{j-M+T}{2M-2T-i-2} \binom{2M-T-i-j-2}{T-C}.$$
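
The second-round distribution can be checked in the same way. The sketch below assumes the simplified form of the probability (two binomial coefficients divided by the number of polling orders, with M' = 2M - T - i - 2) and verifies that it sums to one for every first-round stopping index i that leads to a second round. The function name and the example parameters are illustrative.

```python
from fractions import Fraction
from math import comb


def second_round_stop_pmf(M: int, T: int, C: int, i: int) -> dict:
    """P(second round stops at witness j | first round stopped at i), Pf = 1."""
    m_prime = 2 * M - T - i - 2              # witness-set size M' in round two
    orders = comb(M - i - 1, T + C - M)      # number of possible polling orders
    pmf = {}
    for j in range(m_prime + M - 2 * T, m_prime - T + C + 1):
        ways = (comb(j - M + T, 2 * M - 2 * T - i - 2)
                * comb(2 * M - T - i - j - 2, T - C))
        pmf[j] = Fraction(ways, orders)
    return pmf


M, T, C = 11, 7, 5
for i in range(M - T, 2 * M - 2 * T - 1):    # only i <= 2M-2T-2 gives M' >= T
    print(i, sum(second_round_stop_pmf(M, T, C, i).values()))
```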



Similarly, it is easy to see that the scheme will stop after the second round. Therefore, the 
average overhead is given by 

$$O^{u}_{v1}(M,T,C,1) = \sum_{i=M-T}^{2M-T-C-1} P^{u1}_{v1}(i) \left[ (i-M+T)k + \sum_{j=M'+M-2T}^{M'-T+C} P^{u2}_{v1}(j)\,(M'-T+1)\,k' \right]. \tag{11}$$



Again, when M' < T, i.e., 2M - 2T - 2 < i, the scheme will stop after the first round. Hence, 
the average round delay is 

$$R^{u}_{v1}(M,T,C,1) = \sum_{i=M-T}^{2M-2T-2} 2P^{u1}_{v1}(i) + \sum_{i=2M-2T-1}^{2M-T-C-1} P^{u1}_{v1}(i), \tag{12}$$



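and the average polling delay is 

$$D^{u}_{v1}(M,T,C,1) = \sum_{i=M-T}^{2M-2T-2} P^{u1}_{v1}(i) \left[ i + \sum_{j=M'+M-2T}^{M'-T+C} j\,P^{u2}_{v1}(j) \right] + \sum_{i=2M-2T-1}^{2M-T-C-1} i\,P^{u1}_{v1}(i). \tag{13}$$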



From (8)-(13), we have the average overhead and the average delays of Case 1 as 

$$O_{v1}(M,T,C,1) = \frac{C}{M}\,O^{c}_{v1}(M,T,C,1) + \frac{M-C}{M}\,O^{u}_{v1}(M,T,C,1), \tag{14}$$

$$R_{v1}(M,T,C,1) = \frac{C}{M}\,R^{c}_{v1}(M,T,C,1) + \frac{M-C}{M}\,R^{u}_{v1}(M,T,C,1), \tag{15}$$

and 

$$D_{v1}(M,T,C,1) = \frac{C}{M}\,D^{c}_{v1}(M,T,C,1) + \frac{M-C}{M}\,D^{u}_{v1}(M,T,C,1). \tag{16}$$



A valid fusion result is available in the second case. The first-round process is similar to the 
first-round process in the first case when the chosen node is compromised at the first round. The 
probability that the Tth agreeing vote for the transmitted fusion result is received at witness node 
j of the second-round polling process is 

$$P^{c2}_{v2}(j) = \frac{1}{\Pi^{c2}_{v1}} \binom{j-(M-T-1)-1}{T-(M-T-1)-1} \binom{2M-T-i-j-2}{M-C-(T+1)} = \frac{1}{\Pi^{c2}_{v1}} \binom{j-M+T}{2T-M} \binom{2M-T-i-j-2}{M-C-T-1}.$$



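Significantly, if M = 2T + 1, then j = T. Otherwise, T ≤ j ≤ C - (i - (M - T)) - 1 + T = 
M + C - i - 1. Therefore, the average overhead is expressed as 

$$O^{c}_{v2}(M,T,C,1) = \begin{cases} \displaystyle\sum_{i=M-T}^{M-T+C-1} P^{c1}_{v1}(i) \left[ (M-T)k' + Tk \right] = (M-T)k' + Tk, & M = 2T+1, \\ \displaystyle\sum_{i=M-T}^{M-T+C-1} P^{c1}_{v1}(i) \left[ (M-T)k' + \sum_{j=T}^{M+C-i-1} P^{c2}_{v2}(j)\,Tk \right], & \text{else.} \end{cases} \tag{17}$$

Hence, the average round delay is 

$$R^{c}_{v2}(M,T,C,1) = \sum_{i=M-T}^{M-T+C-1} 2P^{c1}_{v1}(i), \tag{18}$$

and the average polling delay is 

$$D^{c}_{v2}(M,T,C,1) = \sum_{i=M-T}^{M-T+C-1} P^{c1}_{v1}(i) \left[ i + \sum_{j=T}^{M+C-i-1} j\,P^{c2}_{v2}(j) \right]. \tag{19}$$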



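Only one round of polling is needed when the chosen node is uncompromised at the first round. 
The probability that the polling process ends at the witness node i, where T ≤ i ≤ T + C, is 
written as 

$$P^{u1}_{v2}(i) = \frac{1}{\Pi^{u1}_{v1}} \binom{i-1}{T-1} \binom{M-i-1}{M-C-T-1}, \qquad \Pi^{u1}_{v1} = \binom{M-1}{C}.$$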
The average overhead and the delays are given by 

$$O^{u}_{v2}(M,T,C,1) = \sum_{i=T}^{T+C} P^{u1}_{v2}(i)\,Tk, \tag{20}$$

$$R^{u}_{v2}(M,T,C,1) = 1, \tag{21}$$

and 

$$D^{u}_{v2}(M,T,C,1) = \sum_{i=T}^{T+C} i\,P^{u1}_{v2}(i). \tag{22}$$
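
The polling delay in (22) can be evaluated numerically as a cross-check. The sketch below makes an assumption that the garbled source cannot confirm: the stopping probability is taken to be the standard count in which witness i is the Tth uncompromised node polled, so it includes a second binomial factor for the unpolled uncompromised nodes. Under that assumption the mean equals TM/(M - C), the known mean position of the Tth success when sampling without replacement. The function name and the example values (M = 11, T = 7, C = 2) are illustrative.

```python
from fractions import Fraction
from math import comb


def case2_uncompromised_delay(M: int, T: int, C: int):
    """Return (total probability, mean polling delay) for Case 2, Pf = 1."""
    orders = comb(M - 1, C)                  # placements of the C compromised witnesses
    total = Fraction(0)
    delay = Fraction(0)
    for i in range(T, T + C + 1):            # i = T, ..., T+C
        # Assumed form: witness i is the Tth agreeing (uncompromised) node.
        ways = comb(i - 1, T - 1) * comb(M - i - 1, M - C - T - 1)
        p = Fraction(ways, orders)
        total += p
        delay += i * p
    return total, delay


total, delay = case2_uncompromised_delay(M=11, T=7, C=2)   # Case 2: C < M - T
print(total, delay, Fraction(7 * 11, 11 - 2))
```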

From (17)-(22), the average overhead and the delays in Case 2 are given by 

$$O_{v2}(M,T,C,1) = \frac{C}{M}\,O^{c}_{v2}(M,T,C,1) + \frac{M-C}{M}\,O^{u}_{v2}(M,T,C,1),$$

$$R_{v2}(M,T,C,1) = \frac{C}{M}\,R^{c}_{v2}(M,T,C,1) + \frac{M-C}{M}\,R^{u}_{v2}(M,T,C,1),$$

and 

$$D_{v2}(M,T,C,1) = \frac{C}{M}\,D^{c}_{v2}(M,T,C,1) + \frac{M-C}{M}\,D^{u}_{v2}(M,T,C,1).$$
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